Alis Technologies UTF-16, an encoding of ISO 10646
Authors
Abstract
The Unicode Standard [UNICODE] and ISO/IEC 10646 [ISO-10646] jointly define a coded character set (CCS), hereafter referred to as Unicode, which encompasses most of the world’s writing systems [WORKSHOP]. UTF-16, the object of this specification, is a way of encoding Unicode characters: it encodes the vast majority of currently defined characters in exactly two octets and can encode all other characters that will be defined in exactly four octets.
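The two-octet and four-octet cases described in the abstract can be illustrated with a minimal Python sketch. The function names `utf16_code_units` and `utf16_be_octets` are hypothetical, chosen for this illustration only, and the big-endian serialization is an assumption made to keep the example short. It is a sketch of the general UTF-16 surrogate-pair arithmetic, not the specification's own reference code.

```python
def utf16_code_units(code_point):
    """Return the 16-bit UTF-16 code units for one Unicode code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")
    if code_point < 0x10000:
        # Two-octet case: the code point is its own single code unit.
        return [code_point]
    # Four-octet case: split the 20-bit offset into a surrogate pair.
    offset = code_point - 0x10000
    high = 0xD800 | (offset >> 10)    # top 10 bits
    low = 0xDC00 | (offset & 0x3FF)   # bottom 10 bits
    return [high, low]


def utf16_be_octets(code_point):
    """Serialize the code units as octets, big-endian (an assumption here)."""
    units = utf16_code_units(code_point)
    return b"".join(unit.to_bytes(2, "big") for unit in units)


if __name__ == "__main__":
    for cp in (0x0041, 0x20AC, 0x1F600):
        print(hex(cp), [hex(u) for u in utf16_code_units(cp)])
```

Python's built-in `utf-16-be` codec gives the same octets, which makes it a convenient cross-check for the sketch.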
Related resources
Internet Mail Consortium
The Unicode Standard [UNICODE] and ISO/IEC 10646 [ISO-10646] jointly define a character set (hereafter referred to as Unicode) which encompasses most of the world’s writing systems. UTF-16, the object of this specification, is an encoding scheme of this character set that has the characteristics of encoding the vast majority of currently defined characters in exactly two octets and of being ab...
UTF-8, a transformation format of ISO 10646
ISO/IEC 10646-1 defines a multi-octet character set called the Universal Character Set (UCS) which encompasses most of the world’s writing systems. Multi-octet characters, however, are not compatible with many current applications and protocols, and this has led to the development of a few so-called UCS transformation formats (UTF), each with different characteristics. UTF-8, the object of this...
Using Unicode with MIME
The Unicode Standard, version 1.1, and ISO/IEC 10646-1:1993(E) jointly define a 16-bit character set (hereafter referred to as Unicode) which encompasses most of the world’s writing systems. However, Internet mail (STD 11, RFC 822) currently supports only 7-bit US-ASCII as a character set. MIME (RFC 1521 and RFC 1522) extends Internet mail to support different media types and character sets, an...
UTF-7 - A Mail-Safe Transformation Format of Unicode
This memo defines an Experimental Protocol for the Internet community. This memo does not specify an Internet standard of any kind. Distribution of this memo is unlimited. Abstract: The Unicode Standard, version 1.1, and ISO/IEC 10646-1:1993(E) jointly define a 16-bit character set (hereafter referred to as Unicode) which encompasses most of the world's writing systems. However, Internet mail (S...
An Upper-Bound on Information Contained Within a Tweet
While tweets (and this paper) are limited to 140 characters, not all characters are created equal. This paper explores abuses of character encoding schemes to maximize the number of bits that can be conveyed by a tweet. In particular, since Twitter supports Unicode, we examine how we can abuse UTF-8. For example, while people equate a Unicode code point with a character, some can be combined to f...
Journal title:
Volume, issue:
Pages: -
Publication date: 1999